On predictive probability matching priors

Trevor J. Sweeting

University College London 

Abstract: We revisit the question of priors that achieve approximate matching
of Bayesian and frequentist predictive probabilities. Such priors may be
thought of as providing frequentist calibration of Bayesian prediction or simply
as devices for producing frequentist prediction regions. Here we analyse
the $O(n^{-1})$ term in the expansion of the coverage probability of a Bayesian
prediction region, as derived in [Ann. Statist. 28 (2000) 1414-1426]. Unlike
the situation for parametric matching, asymptotic predictive matching priors
may depend on the level $\alpha$. We investigate uniformly predictive matching
priors (UPMPs); that is, priors for which this $O(n^{-1})$ term is zero for all $\alpha$. It
was shown in [Ann. Statist. 28 (2000) 1414-1426] that, in the case of quantile
matching and a scalar parameter, if such a prior exists then it must be Jeffreys'
prior. In the present article we investigate UPMPs in the multiparameter case
and present some general results about the form, and uniqueness or otherwise,
of UPMPs for both quantile and highest predictive density matching.
1. Introduction

Prior distributions that match posterior predictive probabilities with the corresponding
frequentist probabilities are attractive when a major goal of a statistical
analysis is the construction of prediction regions. Such priors provide calibration 
of Bayesian prediction or may be viewed as a Bayesian mechanism for producing 
frequentist prediction intervals. 

It is known that exact predictive probability matching is possible in cases in 
which there exists a suitable transformation group associated with the model. The 
general group structure for parametric models starts with a group of transforma- 
tions on the sample space under which the statistical problem is invariant. This 
group of transformations then gives rise to a group $G$ of transformations on the parameter
space. From an "objective Bayes" point of view, it makes sense to choose
a prior distribution that is (relatively) invariant under this group. In particular,
this will ensure that initial transformation of the data will make no difference to 
predictive inferences. The two fundamental invariant measures on the group G are 
the left and right Haar measures. The left (right) Haar measure is the unique left- 
(right-)translation invariant measure on G, up to a positive multiplicative constant. 
These measures give rise to invariant left and right Haar priors on the parameter 
space. In the decision-theoretic development, under suitable conditions it turns out 
that the right Haar prior gives rise to optimal invariant decision rules for invariant 
decision problems; see, for example, [1]. The left Haar prior, however, which coincides
with Jeffreys' invariant prior, often gives inadmissible rules in multiparameter
cases. These facts provide strong motivation for the use of the right Haar prior. In 
relation to predictive inference, following earlier work in [10] and [11] this intuition 
was further reinforced in [13], where it was shown that if such a group structure 
exists then the associated right Haar prior gives rise to exact predictive matching 
for all invariant prediction regions. Thus the predictive matching problem is solved 
for models that possess a suitable group structure when the prediction region is 
invariant. 

When exact matching is not possible one can instead resort to asymptotic approximation
and investigate approximate predictive matching. This question was
explored in [4] for the case of $n$ independent and identically distributed (i.i.d.)
observations. For regular parametric families the difference between the frequentist
and posterior predictive probabilities is $O(n^{-1})$ and a concise expression for this
difference was obtained in [4] by using the auxiliary prior device introduced by P.
J. Bickel and J. K. Ghosh in [3]. This technical device has proved to be extremely
valuable for the theoretical comparison of Bayesian and frequentist inference statements,
or simply as a Bayesian device for obtaining frequentist results. It has been
particularly useful for deriving probability matching priors (see, for example, [8],
[9] and the review in [5]) and for studying properties of sequential tests ([16]).

In order to find an approximate predictive probability matching prior, one sets
the $O(n^{-1})$ discrepancy to zero and attempts to solve the resulting partial differential
equation (PDE); a number of examples are given in [4]. We briefly review the
main results in [4] in Section 2. Two main issues arise from this analysis. Firstly,
the PDE for a predictive matching prior may be difficult to solve analytically. The
second, and more fundamental, issue is that, except in special cases, the resulting
matching prior will depend on the desired predictive probability level $\alpha$. If there
does exist a prior that gives rise to predictive probability matching for all $\alpha$ then
we shall refer to it as a uniformly predictive matching prior (UPMP). Of course, in
the case of a transformation model and an invariant prediction region we already 
know from [13] that the right Haar prior must be a solution of the PDE. It is in- 
structive to demonstrate this directly and this is done in the Appendix for quantile 
matching. Since the definition of the right Haar prior depends on a specific group 
of transformations on the parameter space, we need to study the effect of param- 
eter transformation on the quantities appearing in the PDE. For this reason, it is 
natural to regard Fisher's information, g, as a Riemannian metric tensor so that 
transformational properties of g and the other quantities that appear in the PDE 
can be studied. 

In the case of quantile matching and a single real parameter, it has already been 
shown in [4] that if there exists a UPMP then it must be Jeffreys' invariant prior. 
This result therefore extends the exact Haar prior result for transformation models 
to the most general models for which approximate uniform matching is possible. 
However, it is clear from examples discussed in [4] and from the general theory for
transformation models in [13] that this result will not hold in the multiparameter
case. For example, the unique UPMP for the normal model with unknown mean and 
variance is the right Haar prior, or Jeffreys' independence prior, whereas Jeffreys' 
prior is the left Haar prior. 

The main purpose of the present article is to investigate the general form of 
UPMPs whenever they exist. In particular, we explore the uniqueness or otherwise 
of the right Haar prior as a UPMP for quantile matching in the case of a transfor- 
mation model. Although UPMPs can exist outside of transformation models, such 
situations would seem to occur rarely. The main results are given in Sections 3 and 4.
In Section 3 we explore the form of the UPMP for quantile matching. In addition to 
confirming that the right Haar prior is a UPMP in suitable transformation models, 
as discussed above, we obtain the general form of the UPMP whenever one exists 
and show that this prior is unique. In particular, it follows that for transformation 
models there are no priors other than the right Haar prior that give approximate 
uniform predictive quantile matching. In Section 4 we consider probability match- 
ing based on highest predictive density regions, which are particularly relevant for 
multivariate data. The scalar parameter case is clear-cut and was essentially treated 
in [4], where it was shown that if there exists a UPMP then it is unique. However, 
unlike quantile matching, this UPMP is not necessarily Jeffreys' prior. The situ- 
ation is less straightforward in the multiparameter case. We show that, under a 
certain condition, if there exists a UPMP then it is unique. If this condition is not 
satisfied then either there will be no UPMP or there will exist an infinite number 
of UPMPs. This section provides predictive versions of results for highest posterior 
density regions obtained by J. K. Ghosh and R. Mukerjee in [8] and [9]. We end
with some discussion in Section 5. 

2. Review of predictive probability matching priors 

We begin by introducing the notation and reviewing the main results in [4] on pre- 
dictive probability matching priors. We consider only the case of i.i.d. observations 
in this article, but the results would be expected to hold more generally under 
suitable conditions. Suppose then that $X_1, X_2, \ldots$ is a sequence of independent
observations having the same distribution as the (possibly vector-valued) continuous
random variable $X$ with density $f(\cdot\,; \theta)$, where $\theta = (\theta_1, \ldots, \theta_p) \in \Omega$ is an unknown
parameter and $\Omega$ is an open subset of $\mathbb{R}^p$. Consider the problem of predicting the
next observation, $X_{n+1}$, based on the first $n$ observations, $d = (X_1, X_2, \ldots, X_n)$.
We assume regularity conditions on $f$ and $\pi$, as detailed in [4]. In particular, the
support of $X$ is assumed to be independent of $\theta$.

Consider first the case of univariate $X$. Let $q(\pi, \alpha, d)$ denote the $1 - \alpha$ quantile of
the posterior predictive distribution of $X_{n+1}$ under the prior $\pi$. That is, $q(\pi, \alpha, d)$
satisfies the equation

(2.1)   $P^{\pi}(X_{n+1} > q(\pi, \alpha, d) \mid d) = \alpha.$

Let $q(\theta, \alpha)$ be the $1 - \alpha$ quantile of $f(\cdot\,; \theta)$; that is

(2.2)   $\int_{q(\theta,\alpha)}^{\infty} f(u; \theta)\, du = \alpha.$

Write $\partial_t = \partial/\partial\theta_t$ and let $f_t(u; \theta) = \partial_t f(u; \theta)$. Define

(2.3)   $\mu_t(\theta, \alpha) = \int_{q(\theta,\alpha)}^{\infty} f_t(u; \theta)\, du.$

Finally, let $g(\theta)$ be the per observation Fisher information matrix, which we assume
to be non-singular for all $\theta \in \Omega$, and let $g_{st}$ and $g^{st}$ be the $(s,t)$th elements of $g$
and $g^{-1}$ respectively.

Using the approach of [3] and [7] in which an auxiliary prior is introduced and
finally allowed to converge weakly to the degenerate measure at $\theta$, it follows from
equations (3.3) and (3.4) in [4] that

(2.4)   $P_\theta(X_{n+1} > q(\pi, \alpha, d)) = \alpha - \frac{\partial_s\{g^{st}(\theta)\mu_t(\theta, \alpha)\pi(\theta)\}}{n\pi(\theta)} + o(n^{-1}).$

Here and elsewhere we use the summation convention. We will say that $\pi$ is a level-$\alpha$
predictive probability matching prior if it satisfies the equation

(2.5)   $\partial_s\{g^{st}(\theta)\mu_t(\theta, \alpha)\pi(\theta)\} = 0.$

From (2.4), such a prior $\pi$ matches the Bayesian and frequentist predictive probabilities
to $o(n^{-1})$. Clearly, in general a solution of (2.5) will depend on the particular
level $\alpha$ chosen. This is demonstrated in [4] for the specific example in which the
observations are from a $N(\theta, \theta)$ distribution. Recalling the discussion in Section 1,
we refer to a prior for which (2.5) holds for all $\alpha$ as a uniformly predictive matching
prior (UPMP). In the case $p = 1$, it was shown in [4] that if there exists a UPMP
then this prior must be Jeffreys' prior. As noted in [4], when no UPMP exists
then the formula on the left-hand side of (2.5) may still be useful for comparing
alternative priors.

Moving to the multiparameter case, examples in [4] illustrate that the above
result on Jeffreys' prior no longer holds. An illustration of this is Example 2 in
[4], which is the location-scale model $f(x; \theta) = \theta_2^{-1} f^*(\theta_2^{-1}(x - \theta_1))$. In this case
there exists a UPMP given by $\pi(\theta) \propto \theta_2^{-1}$, which is the right Haar prior for this
model under the location-scale transformation group, whereas Jeffreys' prior is the
left-invariant prior $\pi(\theta) \propto \theta_2^{-2}$ under this group.
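The location-scale statement can be checked numerically in the normal case. The following is a minimal sketch, not taken from [4]: it assumes $f^*$ is the standard normal density, for which $\mu_1 = f(q;\theta)$, $\mu_2 = z f(q;\theta)$ with $z$ the standard normal $1 - \alpha$ quantile, and $g^{-1} = \mathrm{diag}(\theta_2^2, \theta_2^2/2)$. The function names are illustrative only. It evaluates the left-hand side of (2.5) by central differences.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and density


def mu(theta, alpha):
    """mu_t(theta, alpha) of (2.3), assuming the N(theta_1, theta_2^2) model.

    With z the 1 - alpha standard normal quantile and q = theta_1 + theta_2 * z,
    mu_1 = f(q; theta) and mu_2 = z * f(q; theta).
    """
    _, scale = theta
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    density_at_q = STD_NORMAL.pdf(z) / scale
    return (density_at_q, z * density_at_q)


def g_inverse_diag(theta):
    """Diagonal of the inverse Fisher information for (mean, standard deviation)."""
    _, scale = theta
    return (scale**2, scale**2 / 2.0)


def matching_residual(prior, theta, alpha, step=1e-5):
    """Central-difference value of d_s{ g^{st} mu_t pi }, the left side of (2.5)."""
    def flux(point, s):
        # g is diagonal here, so only t = s contributes to g^{st} mu_t.
        return g_inverse_diag(point)[s] * mu(point, alpha)[s] * prior(point)

    total = 0.0
    for s in range(2):
        up = list(theta)
        down = list(theta)
        up[s] += step
        down[s] -= step
        total += (flux(up, s) - flux(down, s)) / (2.0 * step)
    return total


def right_haar(theta):
    return 1.0 / theta[1]


def jeffreys(theta):
    return 1.0 / theta[1] ** 2


for level in (0.05, 0.3, 0.9):
    print(level,
          matching_residual(right_haar, (0.3, 1.7), level),
          matching_residual(jeffreys, (0.3, 1.7), level))
```

Under these assumptions the residual for the right Haar prior is zero up to rounding at every level, whereas for Jeffreys' prior it is proportional to $z\varphi(z)$ and so vanishes only at $\alpha = 1/2$.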

In the case where $X$ is possibly vector-valued, the coverage properties of highest
predictive density regions are investigated in [4]. This investigation mirrors that in
[8] and [9] for highest posterior density regions. Let $m(\theta, \alpha)$ be such that

$\int_A f(u; \theta)\, du = \alpha,$

where $A = A(\theta, \alpha) = \{u : f(u; \theta) > m(\theta, \alpha)\}$ and define

$\xi_j(\theta, \alpha) = \int_A f_j(u; \theta)\, du.$

Let $H(\pi, \alpha, d)$ be the level-$\alpha$ highest predictive density region under the prior $\pi$.
Then, as for quantile matching, it follows from the results in Section 5 of [4] that

$P_\theta(X_{n+1} \in H(\pi, \alpha, d)) = \alpha - \frac{\partial_s\{g^{st}(\theta)\xi_t(\theta, \alpha)\pi(\theta)\}}{n\pi(\theta)} + o(n^{-1}).$

Thus $\pi$ is a level-$\alpha$ predictive probability matching prior if and only if it satisfies
the equation

(2.6)   $\partial_s\{g^{st}(\theta)\xi_t(\theta, \alpha)\pi(\theta)\} = 0.$

Once again we see that in general the solution $\pi$ will depend on the level $\alpha$. Examples
are given in [4] in which there are no priors that satisfy (2.6) for all $\alpha$. Moreover,
even in the case $p = 1$, if there does exist a unique prior satisfying (2.6) for all $\alpha$
then it is not necessarily Jeffreys' prior.

3. UPMPs: quantile matching

As discussed in Section 2 we know that when $p = 1$ and a UPMP exists for quantile
matching as in (2.1), then it must be Jeffreys' prior. However, it need not be Jeffreys'
prior when $p > 1$. Under a suitable group structure on the model, the results in [13]
imply that the associated right Haar prior gives exact predictive matching, since
the prediction region here is invariant. Thus in these cases the right Haar prior
must also be a solution of equation (2.5). It is instructive to demonstrate directly
that this is indeed the case.

First note from the product rule that equation (2.5) is equivalent to

(3.1)   $g^{st}(\theta)\mu_t(\theta, \alpha)\partial_s\lambda(\theta) + \partial_s\{g^{st}(\theta)\mu_t(\theta, \alpha)\} = 0,$

where $\lambda(\theta) = \log\pi(\theta)$. Suppose that there exists a group $G$ of bijective transformations
on the sample space under which the statistical problem is invariant. Further
assume, as in [13], that $G = \Omega$, a locally compact topological group. In this case
the distribution of $X$ under $\theta$ is the same as that of $\theta X$ under $e$, the identity element
of the group, with $\theta$ regarded as an element of the transformation group $G$.
Then there exist unique (up to a multiplicative constant) left-invariant and right-invariant
Haar measures on $G$, giving left and right Haar priors on the parameter
space. In the following we denote the right Haar prior density on $\Omega$ by $\pi^H$. The
proof of the following theorem is given in the Appendix.

Theorem 3.1. Under the above group structure the right Haar prior satisfies equation (3.1).

Two questions naturally arise. First, if the above group structure exists then
can there be UPMPs other than the right Haar prior? The answer to this question
turns out to be "no," as follows from Theorem 3.2 below. Second, if the above group
structure does not exist can there still be a UPMP? The answer to this question is
"yes." An example in the case $p = 1$ is given in Section 3 of [4] for which there is
no suitable group structure but there is still a unique UPMP, which must of course
be Jeffreys' prior.

We now establish the general form of the UPMP whenever it exists and show that
it is unique. This is a multiparameter version of Theorem 1 in [4]. Let $F(x; \theta)$ be
the distribution function of $X$, $l(x; \theta) = \log f(x; \theta)$ and write $F_s(x; \theta) = \partial_s F(x; \theta)$,
$l_s(x; \theta) = \partial_s \log f(x; \theta)$. Define the functions

(3.2)   [display defining $h_r(\theta)$; not recoverable from the source]

where the integration is over the (common) support of $F(x; \theta)$. Finally write $\lambda^J = \log\pi^J$,
where $\pi^J(\theta) \propto |g(\theta)|^{1/2}$ is Jeffreys' prior.

Theorem 3.2. Suppose that there exists a UPMP, $\pi$, for quantile matching. Then
$\pi$ is the unique UPMP and the partial derivatives of $\lambda = \log\pi$ are given by

(3.3)   $\partial_r\lambda(\theta) = \partial_r\lambda^J(\theta) + h_r(\theta).$

Proof. We begin by expressing $g(\theta)$ in terms of the functions $\mu_t(\theta, \alpha)$ defined at
(2.3). By differentiation of equation (2.2) with respect to $\alpha$ we see that
$-f(q; \theta)\,\partial q/\partial\alpha = 1$, while differentiation of equation (2.3) gives

(3.4)   $\partial\mu_j(\theta, \alpha)/\partial\alpha = -f_j(q; \theta)\,\partial q/\partial\alpha = l_j(q; \theta),$

on substitution of $\partial q/\partial\alpha$ from the previous relation. It follows that

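(3.5)   $g_{ij}(\theta) = \int l_i(q; \theta)\, l_j(q; \theta)\, f(q; \theta)\, dq = \int_0^1 \left(\frac{\partial\mu_i(\theta, \alpha)}{\partial\alpha}\right)\left(\frac{\partial\mu_j(\theta, \alpha)}{\partial\alpha}\right) d\alpha.$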

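Suppose that there exists a UPMP $\pi$. Differentiation of equation (3.1) with
respect to $\alpha$ and multiplication by $\partial\mu_r/\partial\alpha$ gives the equation

$g^{st}\frac{\partial\mu_t}{\partial\alpha}\frac{\partial\mu_r}{\partial\alpha}\,\partial_s\lambda + \frac{\partial\mu_r}{\partial\alpha}\,\partial_s\left(g^{st}\frac{\partial\mu_t}{\partial\alpha}\right) = 0.$

Since this relation must hold for all $0 < \alpha < 1$, integration over $0 < \alpha < 1$ gives

(3.6)   $g^{st}\left(\int_0^1 \frac{\partial\mu_t}{\partial\alpha}\frac{\partial\mu_r}{\partial\alpha}\, d\alpha\right)\partial_s\lambda = -\int_0^1 \frac{\partial\mu_r}{\partial\alpha}\,\partial_s\left(g^{st}\frac{\partial\mu_t}{\partial\alpha}\right) d\alpha.$

But from (3.5) the left-hand side of (3.6) is $g^{st}g_{tr}\partial_s\lambda = \delta^s_r\partial_s\lambda = \partial_r\lambda$, where $\delta^s_r$ is
the Kronecker delta function. Also, since $\partial_s(g^{st}g_{tr}) = \partial_s(\delta^s_r) = 0$, the product rule
gives

$\int_0^1 \frac{\partial\mu_r}{\partial\alpha}\,\partial_s\left(g^{st}\frac{\partial\mu_t}{\partial\alpha}\right) d\alpha = -g^{st}\int_0^1 \frac{\partial\mu_t}{\partial\alpha}\,\partial_s\left(\frac{\partial\mu_r}{\partial\alpha}\right) d\alpha,$

so that (3.6) becomes

$\partial_r\lambda = g^{st}\int_0^1 \frac{\partial\mu_t}{\partial\alpha}\,\partial_s\left(\frac{\partial\mu_r}{\partial\alpha}\right) d\alpha.$

This expression gives the partial derivatives of $\lambda = \log\pi$ and, furthermore, establishes
that $\pi$ is the unique UPMP. We now show that this expression is equivalent
to (3.3).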

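We first obtain the partial derivatives of $\lambda^J$. From a standard result for the
derivative of a matrix determinant, we have

$\partial_r\lambda^J = \tfrac{1}{2}\partial_r\log|g| = \tfrac{1}{2}g^{st}\partial_r g_{st} = g^{st}\int_0^1 \frac{\partial\mu_t}{\partial\alpha}\,\partial_r\left(\frac{\partial\mu_s}{\partial\alpha}\right) d\alpha,$

again using (3.5). The difference between the $r$th partial derivatives of $\lambda$ and $\lambda^J$ is
therefore

(3.7)   $\partial_r\lambda - \partial_r\lambda^J = g^{st}\int_0^1 \frac{\partial}{\partial\alpha}(\partial_s\mu_r - \partial_r\mu_s)\,\frac{\partial\mu_t}{\partial\alpha}\, d\alpha.$

Differentiation of (2.2) with respect to $\theta_r$ gives, writing $q = q(\theta, \alpha)$ and $q_r = \partial_r q$,
$-q_r f(q; \theta) + \mu_r(\theta, \alpha) = 0$, from which we obtain

$\partial_r\mu_s(\theta, \alpha) = \int_q^\infty f_{rs}(u; \theta)\, du - f_s(q; \theta)\, q_r = \int_q^\infty f_{rs}(u; \theta)\, du - l_s(q; \theta)\mu_r(\theta, \alpha).$

Furthermore, we have

$\frac{\partial}{\partial\alpha}\{l_s(q; \theta)\mu_r(\theta, \alpha)\} = \frac{\partial l_s(q; \theta)}{\partial q}\frac{\partial q}{\partial\alpha}\mu_r(\theta, \alpha) + l_s(q; \theta)\, l_r(q; \theta).$

It now follows from these two relations that

$\frac{\partial}{\partial\alpha}(\partial_r\mu_s - \partial_s\mu_r) = \frac{\partial q}{\partial\alpha}\left(\frac{\partial l_r}{\partial q}\mu_s - \frac{\partial l_s}{\partial q}\mu_r\right).$

Substituting into equation (3.7) gives

(3.8)   $h_r = g^{st}\int\left(F_r\frac{\partial l_s}{\partial q} - F_s\frac{\partial l_r}{\partial q}\right) l_t\, dq$

on the change of variables from $\alpha$ to $q$, using equation (3.4) and on noting that
$\mu_s(\theta, \alpha(q, \theta)) = -F_s(q; \theta)$. Next note that the indefinite integral

$\int F_s(q; \theta)\frac{\partial l_r(q; \theta)}{\partial q}\, dq = F_s(q; \theta)\, l_r(q; \theta) - \int l_s(q; \theta)\, l_r(q; \theta)\, f(q; \theta)\, dq,$

from which it follows by an integration by parts that (3.8) is equivalent to (3.2), as
required. □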

In the case $p = 1$ we have $h_r = 0$ so the unique UPMP is Jeffreys' prior, as given
in Theorem 1 of [4]. For the location-scale model $f(x; \theta) = \theta_2^{-1} f^*(\theta_2^{-1}(x - \theta_1))$
discussed in Section 2, it can be verified that the solution to (3.3) is $\pi(\theta) \propto \theta_2^{-1}$,
which is the right Haar prior for this model under the location-scale transformation
group. In general a necessary condition for there to be a UPMP is that $h_r$ be a
derivative field. The condition is not sufficient, however, as Jeffreys' prior always
satisfies equation (3.3) in the case $p = 1$ but we know from [4] that Jeffreys' prior is
not necessarily a UPMP. When $p > 1$ the condition that $h_r$ be a derivative field is
a very strong one when the model is not transformational. We have been unable to
construct a two-dimensional example that is not transformational and that satisfies
this condition. Even given a model satisfying this condition, the resulting prior may
still not satisfy (2.5) for all $\alpha$. Thus it would seem that UPMPs rarely exist outside
of transformation models. The major point of Theorem 3.2, however, is to show
that if a UPMP does exist then it is unique.
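The location-scale claim can be illustrated numerically. The sketch below is an assumption-laden check rather than the author's computation: it takes $f^*$ to be standard normal and evaluates the integral form of $\partial_r\lambda$ that arises in the proof of Theorem 3.2, read here as $\partial_r\lambda = g^{st}\int_0^1 (\partial\mu_t/\partial\alpha)\,\partial_s(\partial\mu_r/\partial\alpha)\,d\alpha$, using (3.4) to replace $\partial\mu_t/\partial\alpha$ by the score at the quantile. The integral over $\alpha$ is computed after the substitution $d\alpha = \varphi(z)\,dz$, and all function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def score_at_quantile(theta, z):
    """(l_1, l_2) at q = theta_1 + theta_2 * z, assuming the N(theta_1, theta_2^2) model.

    By (3.4) these equal the alpha-derivatives of mu_1 and mu_2.
    """
    _, scale = theta
    return (z / scale, (z * z - 1.0) / scale)


def g_inverse_diag(theta):
    """Diagonal of the inverse Fisher information for (mean, standard deviation)."""
    _, scale = theta
    return (scale**2, scale**2 / 2.0)


def log_prior_gradient(theta, step=1e-5, z_max=10.0, n_grid=4000):
    """Approximate d_r lambda = g^{st} int (d mu_t/d alpha) d_s(d mu_r/d alpha) d alpha."""
    g_inv = g_inverse_diag(theta)
    width = 2.0 * z_max / n_grid
    gradient = [0.0, 0.0]
    for k in range(n_grid):
        z = -z_max + (k + 0.5) * width        # midpoint rule in z
        weight = STD_NORMAL.pdf(z) * width    # d(alpha) = phi(z) dz
        base = score_at_quantile(theta, z)
        for s in range(2):                    # g is diagonal, so t = s
            up = list(theta)
            down = list(theta)
            up[s] += step
            down[s] -= step
            score_up = score_at_quantile(up, z)
            score_down = score_at_quantile(down, z)
            for r in range(2):
                d_s = (score_up[r] - score_down[r]) / (2.0 * step)
                gradient[r] += g_inv[s] * base[s] * d_s * weight
    return gradient


gradient = log_prior_gradient((0.3, 1.7))
print(gradient)
```

Under these assumptions the computed gradient should be close to $(0, -1/\theta_2)$, the gradient of $\log\theta_2^{-1}$, in line with the right Haar prior being the solution.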

Note that, whether or not a UPMP exists, when $h_r$ is a derivative field then
(3.2) defines a unique prior $\pi$ which, from the proof of Theorem 3.2, satisfies
the equation

$\int_0^1 \frac{\partial\mu_r}{\partial\alpha}\frac{\partial e}{\partial\alpha}\, d\alpha = 0,$

where $e(\theta, \alpha)$ is the $O(n^{-1})$ error term in (2.4).
Assuming that $\partial\mu_r/\partial\alpha$ is well behaved at $\alpha = 0$ and $\alpha = 1$, integration by parts
shows that this is equivalent to

$\int_0^1 \frac{\partial^2\mu_r}{\partial\alpha^2}\, e\, d\alpha = 0$

for all $\theta$ and $r$. These relations give some sort of average prediction error, but it is
unclear what precise interpretation can be given to them.

Finally, when there exists a suitable group structure as discussed earlier then we
know that $\partial_i\lambda$ must be $\partial_i\lambda^H$, where $\lambda^H = \log\pi^H$. Furthermore, since Jeffreys' prior
is the left Haar prior, it follows that $h_i(\theta) = \partial_i\log\Delta(\theta^{-1})$, where $\Delta$ is the modulus
of $\Omega$ and $\theta^{-1}$ is the group inverse of $\theta$.



4. UPMPs: highest predictive density region matching 



We consider now the case where $X$ is possibly vector-valued. The question of the
existence of UPMPs for highest predictive density regions in this case is not so
straightforward as the quantile matching case discussed in Section 3. In particular,
even if there exists a suitable group structure, as in Section 3, the prediction region
$H(\pi, \alpha, d)$ defined in Section 2 is not invariant under transformation of $X$ (unless
the group is affine), so the associated right Haar prior is not necessarily a UPMP. We
also know that when a UPMP does exist it may not be unique. This was illustrated
in Example 4 in [4] of the bivariate normal model with unknown covariance matrix;
we will return to this example in Example 1 below.

The scalar parameter case is straightforward, however. For each $\alpha$ the prior

$\pi(\theta) \propto g(\theta)\{\xi(\theta, \alpha)\}^{-1}$

is the unique solution to (2.6). It follows that there exists a UPMP if and only
if $\xi(\theta, \alpha) = Q(\theta)R(\alpha)$, as was noted in [4] where examples are given in which this
condition does and does not hold. Unlike the case of quantile matching, however,
the unique solution when it exists is not necessarily Jeffreys' prior. For example,
in [4] it is shown that a unique UPMP exists for the $N(\theta, \theta)$ model but this is not
Jeffreys' prior.

The multiparameter case is more difficult. The simplest situation is when $\xi_t(\theta, \alpha)$
is of the form

(4.1)   $\xi_t(\theta, \alpha) = Q_t(\theta)R(\alpha).$

Then every UPMP will be a solution of the Lagrange PDE

(4.2)   $\partial_s\{g^{st}(\theta)Q_t(\theta)\pi(\theta)\} = 0.$

This equation may have no solutions or an infinite number of solutions.

Example 1. Consider the bivariate normal model with zero means and unknown
standard deviations $\sigma_1, \sigma_2$ and correlation coefficient $\rho$. Let $\Sigma$ be the covariance
matrix of $X$. We work with the orthogonal parameterisation

$T^{-1} = \begin{pmatrix} \theta_1 & 0 \\ \theta_2\theta_3 & \theta_2 \end{pmatrix},$

where $\Sigma = TT'$ and $T$ is the left Cholesky square root of $\Sigma$. It can then be shown
that the information matrix is

$g(\theta) = \mathrm{diag}(2\theta_1^{-2},\ 2\theta_2^{-2},\ \theta_1^{-2}\theta_2^{2}).$

Furthermore, by transforming to $Z = T^{-1}X$, it can be shown that

$m(\theta, \alpha) = \theta_1\theta_2(1 - \alpha)/(2\pi), \quad \xi_1(\theta, \alpha) = \theta_1^{-1}R(\alpha), \quad \xi_2(\theta, \alpha) = \theta_2^{-1}R(\alpha), \quad \xi_3(\theta, \alpha) = 0,$

where $R(\alpha) = -(1 - \alpha)\log(1 - \alpha)$. Thus $\xi_t(\theta, \alpha)$ is of the form (4.1). Therefore the
UPMPs are all the solutions of the PDE (4.2) with $Q_1(\theta) = \theta_1^{-1}$, $Q_2(\theta) = \theta_2^{-1}$
and $Q_3(\theta) = 0$. The general solution is found to be

$\pi(\theta) \propto \theta_1^{-2}\, h(\theta_1^{-1}\theta_2, \theta_3),$

where $h$ is an arbitrary positive function. Notice that the leading term is
$|g(\theta)|^{1/2}$, so Jeffreys' prior is a UPMP. In terms of $(\sigma_1, \sigma_2, \rho)$ we have

$\theta_1 = \sigma_1^{-1}, \quad \theta_2 = \sigma_2^{-1}(1 - \rho^2)^{-1/2}, \quad \theta_3 = -\rho\sigma_1^{-1}\sigma_2$

with Jacobian of transformation $\sigma_1^{-3}\sigma_2^{-1}(1 - \rho^2)^{-3/2}$. With suitable re-expression
of $h$ we find that

(4.3)   $\pi(\sigma_1, \sigma_2, \rho) \propto \pi^J(\sigma_1, \sigma_2, \rho)\, H(\sigma_1^{-1}\sigma_2, (1 - \rho^2)^{1/2}),$

where $\pi^J(\sigma_1, \sigma_2, \rho) \propto \sigma_1^{-1}\sigma_2^{-1}(1 - \rho^2)^{-3/2}$ is Jeffreys' prior and $H$ is an arbitrary
positive function. This is a very wide class of priors. In particular, taking $H(x, y) =
x^a y^{2b}$, we see that all priors of the form $\pi^J(\sigma_1, \sigma_2, \rho)(\sigma_1^{-1}\sigma_2)^a(1 - \rho^2)^b$ are UPMPs.
Taking $a = 1$, $b = 1/2$ we obtain $\sigma_1^{-2}(1 - \rho^2)^{-1}$, which can be shown to be the
right Haar prior arising from the group of transformations $T^{-1}X$ on the sample
space, where $T$ is a lower triangular matrix with positive diagonal elements. This
group is isomorphic to $\Omega$ and since in this case the region $A$ is invariant it follows
from [13] that this prior must be a UPMP. Similarly, all right Haar priors arising
from transformations of the form $T^{-1}MX$, with $M$ a fixed non-singular matrix,
are included in (4.3).
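The general solution in Example 1 can be checked by finite differences. This is a rough sketch under the reading that $g^{-1} = \mathrm{diag}(\theta_1^2/2,\ \theta_2^2/2,\ \theta_1^2\theta_2^{-2})$ and $Q = (\theta_1^{-1}, \theta_2^{-1}, 0)$, so that (4.2) reduces to $\partial_1(\theta_1\pi) + \partial_2(\theta_2\pi) = 0$ up to a constant factor. The particular function $h$ and the function names below are arbitrary illustrative choices.

```python
import math


def flux_divergence(prior, theta, step=1e-6):
    """Central-difference value of d_s{ g^{st} Q_t pi } for Example 1.

    Assumes g^{-1} = diag(theta_1^2 / 2, theta_2^2 / 2, theta_1^2 / theta_2^2)
    and Q = (1 / theta_1, 1 / theta_2, 0), so only s = 1, 2 contribute.
    """
    def flux(point, s):
        return 0.5 * point[s] * prior(point)  # g^{ss} Q_s pi = theta_s pi / 2

    total = 0.0
    for s in range(2):
        up = list(theta)
        down = list(theta)
        up[s] += step
        down[s] -= step
        total += (flux(up, s) - flux(down, s)) / (2.0 * step)
    return total


def family_member(theta):
    """theta_1^{-2} h(theta_2 / theta_1, theta_3) for one arbitrary positive h."""
    ratio = theta[1] / theta[0]
    return theta[0] ** -2 * (1.0 + ratio**1.5) * math.exp(-theta[2] ** 2)


def flat_prior(theta):
    """A prior outside the family, for contrast."""
    return 1.0


point = (0.8, 1.4, -0.3)
print(flux_divergence(family_member, point), flux_divergence(flat_prior, point))
```

The first value should vanish up to discretisation error, while the flat prior leaves a non-zero residual, so it does not solve (4.2) under these assumptions.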

We now return to the general analysis of equation (2.6). In Theorem 4.1 below,
when we say that the functions $\xi_t(\theta, \alpha)$ are linearly independent we shall mean that
they are linearly independent as functions of $\alpha$ for fixed $\theta$.

Theorem 4.1. Suppose that the functions $\xi_t(\theta, \alpha)$ are linearly independent and
that there exists a UPMP, $\pi$, for highest predictive density region matching. Then
$\pi$ is the unique UPMP and the partial derivatives of $\lambda = \log\pi$ are given by

(4.4)   $\partial_i\lambda = -b^{rj}g_{ji}\int_0^1 \frac{\partial\xi_r}{\partial\alpha}\,\partial_s\left(g^{st}\frac{\partial\xi_t}{\partial\alpha}\right) d\alpha,$

where $(b^{ij}(\theta))$ is the inverse of the non-singular matrix function $(b_{ij}(\theta))$ with $(i,j)$th
element

(4.5)   $b_{ij}(\theta) = \int_0^1 \left(\frac{\partial\xi_i(\theta, \alpha)}{\partial\alpha}\right)\left(\frac{\partial\xi_j(\theta, \alpha)}{\partial\alpha}\right) d\alpha.$

Proof. We begin by showing that the matrix $(b_{ij}(\theta))$ is non-singular for all $\theta \in \Omega$
if and only if the functions $\xi_t(\theta, \alpha)$ are linearly independent. From the definition
(4.5), we see that in general $(b_{ij}(\theta))$ is positive semidefinite and is therefore singular
for all $\theta \in \Omega$ if and only if, for each $\theta$, there exist functions $x^t(\theta)$, not all zero, for
which $b_{ij}(\theta)x^i(\theta)x^j(\theta) = 0$. This is equivalent to the condition

$\int_0^1 \left\{\frac{\partial(x^t(\theta)\xi_t(\theta, \alpha))}{\partial\alpha}\right\}^2 d\alpha = 0,$

which in turn holds if and only if $\partial(x^t(\theta)\xi_t(\theta, \alpha))/\partial\alpha = 0$ for all $\theta$ and $\alpha$. Since
$\xi_t(\theta, 1) = 0$ it follows that a necessary and sufficient condition for the singularity
of $(b_{ij}(\theta))$ is the existence of $x^t(\theta)$, not all zero, such that $x^t(\theta)\xi_t(\theta, \alpha) = 0$ for all
$\theta$ and $\alpha$. That is, the functions $\xi_t(\theta, \alpha)$ are linearly dependent.

We now apply the product rule to (2.6) to give equation (3.1) with $\mu_t$ replaced
by $\xi_t$. Exactly as in the proof of Theorem 3.2, we differentiate this equation with
respect to $\alpha$, multiply by $\partial\xi_r/\partial\alpha$ and integrate over $0 < \alpha < 1$ to give

$g^{st}b_{tr}\partial_s\lambda + \int_0^1 \frac{\partial\xi_r}{\partial\alpha}\,\partial_s\left(g^{st}\frac{\partial\xi_t}{\partial\alpha}\right) d\alpha = 0.$

Finally, under the condition of the theorem the matrix $(b_{ij}(\theta))$ is non-singular
and equation (4.4) follows on multiplying both sides of the above expression by
$b^{rj}g_{ji}$. □






In the case $p = 1$ we know that a UPMP exists if and only if $\xi(\theta, \alpha) = Q(\theta)R(\alpha)$,
in which case

$b_{11}(\theta) = \{Q(\theta)\}^2\int_0^1 \{R'(\alpha)\}^2\, d\alpha.$

Equation (4.4) then becomes $\partial\lambda/\partial\theta = -\partial\log(g^{-1}Q)/\partial\theta$, giving $\pi(\theta) \propto \{Q(\theta)\}^{-1}g(\theta)$
in agreement with the earlier discussion. In the multiparameter case, unlike Theorem 3.2,
there does not appear to be any simple further development of (4.4).
Returning to the univariate location-scale model $f(x; \theta) = \theta_2^{-1}f^*(\theta_2^{-1}(x - \theta_1))$, it
can be verified that the functions $\xi_t(\theta, \alpha)$ are linearly independent and, as in Section 3,
that the right Haar prior $\pi(\theta) \propto \theta_2^{-1}$ under the location-scale transformation
group is the solution to (4.4). When $p > 1$ the condition that the right-hand side of
(4.4) be a derivative field is very strong when the model is not transformational and
we have been unable to find a two-dimensional example that is not a transformation
model satisfying this condition. Again, as in Section 3, even for such an example the
resulting prior may still not satisfy (2.6) for all $\alpha$. Thus it would seem that unique
UPMPs rarely exist outside of transformation models. As with Theorem 3.2, the
major point of Theorem 4.1 is to show that, under the conditions of the theorem,
if a UPMP does exist then it is unique.

Note that when $p > 1$ Theorem 4.1 does not apply to the case (4.1) since the
functions $\xi_t(\theta, \alpha)$ are linearly dependent and hence the matrix $(b_{ij}(\theta))$ is singular.
A more general sufficient condition for linear dependence of the $\xi_t(\theta, \alpha)$ is

(4.6)   $\xi_t(\theta, \alpha) = U_t(\theta)S(\theta, \alpha).$

Note that this is also a necessary condition for linear dependence in the case $p = 2$.
Suppose that (4.6) holds and that there exists a UPMP $\pi$. Then from equation
(2.6) we see that

$g^{st}U_t\,\partial_s\lambda + g^{st}U_t\,\partial_s\log S + \partial_s(g^{st}U_t) = 0$

for all $\alpha$, which implies that the function $g^{st}(\theta)U_t(\theta)\partial_s\log S(\theta, \alpha)$ must be free
from $\alpha$. Since no boundary conditions are imposed on the solutions to the resulting
Lagrange PDE, it follows that $\pi$ must be one of an infinite number of solutions.
Thus, under condition (4.6), either there is no UPMP or there is an infinite number
of UPMPs. Note that (4.1) is a special case of (4.6).
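The singularity of $(b_{ij}(\theta))$ under the product form (4.1) is easy to see numerically. The following sketch assumes that $b_{ij}(\theta)$ is the integral over $0 < \alpha < 1$ of the product of the $\alpha$-derivatives of $\xi_i$ and $\xi_j$, and uses the first two components of Example 1, $\xi_t = \theta_t^{-1}R(\alpha)$ with $R(\alpha) = -(1 - \alpha)\log(1 - \alpha)$. The second pair of functions is purely hypothetical and is included only to contrast with a linearly independent case.

```python
import math


def b_matrix(xi_alpha_derivative, n_params, n_grid=20000):
    """Midpoint-rule approximation to b_ij = int_0^1 (d xi_i/d alpha)(d xi_j/d alpha) d alpha."""
    width = 1.0 / n_grid
    b = [[0.0] * n_params for _ in range(n_params)]
    for k in range(n_grid):
        d_xi = xi_alpha_derivative((k + 0.5) * width)
        for i in range(n_params):
            for j in range(n_params):
                b[i][j] += d_xi[i] * d_xi[j] * width
    return b


def det2(b):
    """Determinant of a 2 x 2 matrix stored as nested lists."""
    return b[0][0] * b[1][1] - b[0][1] * b[1][0]


def product_form(alpha, theta=(0.8, 1.4)):
    """d xi_t / d alpha for xi_t = R(alpha) / theta_t, so R'(alpha) = 1 + log(1 - alpha)."""
    r_prime = 1.0 + math.log(1.0 - alpha)
    return [r_prime / theta[0], r_prime / theta[1]]


def independent_pair(alpha):
    """Hypothetical pair with linearly independent alpha-derivatives (illustration only)."""
    return [1.0 + math.log(1.0 - alpha), alpha - 0.5]


singular_det = det2(b_matrix(product_form, 2))
regular_det = det2(b_matrix(independent_pair, 2))
print(singular_det, regular_det)
```

In the product-form case the matrix is a rank-one outer product, so its determinant is zero up to rounding and (4.4) cannot be formed; the hypothetical independent pair gives a strictly positive determinant.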

It might appear at first sight that it is also possible to have an infinite number
of UPMPs in the case of quantile matching, which would contradict the result of
Theorem 3.2. However, using a parallel argument to that given above, we see that the
structure (4.6) for $\mu_t(\theta, \alpha)$ cannot occur, as this would imply singularity of Fisher's
information matrix.

Finally, the case

(4.7)   $\xi_t(\theta, \alpha) = Q_t(\theta)R_t(\alpha),$

which is a generalisation of the simple case (4.1), is of some interest. It is easily
seen that in this case the linear independence of the functions $\xi_t(\theta, \alpha)$ is equivalent
to the linear independence of the functions $R_t(\alpha)$. Furthermore, the matrix $(b_{ij}(\theta))$
will be non-singular for all $\theta$ if and only if the matrix with $(i,j)$th element

$\int_0^1 R_i'(\alpha)R_j'(\alpha)\, d\alpha$

is positive definite. This turns out to be the case for the location-scale models
discussed earlier.

Example 2. Consider the multivariate location model with density $f(x; \theta) =
f^*(x_1 - \theta_1, \ldots, x_p - \theta_p)$. The region $A$ here is invariant under the group of transformations
$x + a$, $a \in \mathbb{R}^p$, and it follows from [13] that the right Haar prior is an exact
UPMP. Here the right Haar prior is also Jeffreys' prior, both being constant. We
now investigate conditions under which this is the unique UPMP. As in [4], we find
that $m(\theta, \alpha) = m(\alpha)$, free from $\theta$, and $\xi_t(\theta, \alpha) = R_t(\alpha)$, which is of the form (4.7)
with $Q_t(\theta) = 1$ for all $t$. It follows from the above discussion that Jeffreys' prior is
the unique UPMP if and only if the functions $R_t(\alpha)$ are linearly independent. In
that case, since both $g^{st}$ and $\xi_t$ are free from $\theta$, the right-hand side of (4.4) is zero
and, again, the unique UPMP is the uniform prior.

For many standard models, however, the functions $R_t(\alpha)$ will be linearly dependent.
Suppose, for example, that $f^*$ is elliptically symmetric, so that $f^*(z) =
H(z'Cz)$ for some positive definite matrix $C$. Then it can be checked that $R_t(\alpha) =
Q_tR(\alpha)$, which is of the form (4.1) with $Q_t$ free from $\theta$. The functions $R_t(\alpha)$ are
clearly linearly dependent and hence, since we know that there exists at least
one UPMP, there will be an infinite number of UPMPs. For example, in the
case where $f^*$ is spherically symmetric, we have $Q_t = Q$ and the Lagrange PDE
(4.2) becomes $\sum_s \partial_s\lambda = 0$. The solutions of this equation are of the form $\pi(\theta) \propto
\exp\{h(\theta_2 - \theta_1, \ldots, \theta_p - \theta_1)\}$, where $h$ is an arbitrary function. In particular, all priors
of the form $\pi(\theta) \propto \exp(\sum_i a_i\theta_i)$ with $\sum_i a_i = 0$ will be uniformly matching in this
case.

A similar analysis may be carried out for the multiparameter location-scale model
with different location parameters, as described in [4]. Whether or not the scale parameters
are assumed to be equal, there is an appropriate group of transformations
for which the corresponding right Haar prior will be a UPMP. In either case $\xi_t(\theta, \alpha)$
is again of the form (4.7) so that whether or not the right Haar prior is the unique
UPMP will depend on the linear independence or otherwise of the functions $R_t(\alpha)$.

When the model has no suitable group structure, we conjecture that the functions
$\xi_r(\theta, \alpha)$ will always be linearly independent. To see the plausibility of this, note that
the $\xi_r(\theta, \alpha)$ are linearly dependent if and only if there exist functions $x^t(\theta)$, not all
zero, such that $\int_A \{x^t(\theta)l_t(x; \theta)\}f(x; \theta)\, dx = 0$ for all $\theta$ and $\alpha$. Since the density
$f(x; \theta)$ cannot be standardised by transformation, the only way that this would
seem to be possible is if $x^t(\theta)l_t(x; \theta) = 0$ for all $\theta$. However, it is easily seen
by partial differentiation with respect to $\theta_s$ that this condition leads to $g$ being singular. This
analysis therefore suggests that if the model is not transformational then there will
either be no UPMP or a unique UPMP, which is then given by (4.4).

5. Discussion 

Although it is known that exact matching of invariant prediction regions is achieved 
by the right Haar prior under a suitable group structure on the model, we have 
seen in Section 3 that there can be no other priors that achieve approximate uniform
predictive quantile matching, and that uniformly matching priors can exist when 
there is no suitable group structure, although these are rare. In common with other 
work on probability matching priors, predictive matching priors arise as solutions 
to a particular PDE, which in general can be very difficult to solve. However, in the
case of uniform quantile matching, if a UPMP exists then it is unique and explicit 
formulae for its partial derivatives are available from Theorem 3.2. 

Except in special cases, derivation of the UPMP for quantile matching via equation
(3.3), or even verifying that the derivatives in (3.3) are consistent, will be
intractable. An attractive alternative would be to use a data-dependent approximation
of the UPMP based on a local prior of the form

$\partial_r \lambda(\theta, \theta_0) = \partial_r \lambda^J(\theta) + h_r(\theta_0)$.

See [14] for a derivation of data-dependent matching priors for marginal posterior 
distributions. Furthermore, since a data-dependent prior of this form will always 
exist, there may be cases for which it will be uniformly matching even when there 
is no $\alpha$-free solution of (2.5). Although the posterior distribution arising from such
a prior would not always have a strict Bayesian interpretation, use of the corre- 
sponding predictive distribution could provide a useful mechanism for constructing 
frequentist prediction regions with good coverage properties. It would be of inter- 
est to conduct simulation experiments in order to assess the predictive coverage 
afforded by such priors. 
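
A simulation of this kind might be organised along the following lines. The sketch below does not implement the data-dependent local prior discussed above. As an assumed stand-in it uses a case where matching is known to be exact: i.i.d. exponential observations with rate `theta` and the scale-invariant prior proportional to `1/theta`, for which the predictive survivor function is `(S / (S + y))**n` with `S` the sum of the observations. The function names `predictive_quantile` and `estimate_coverage` are hypothetical.

```python
import random


def predictive_quantile(total, n, alpha):
    """Return y such that the posterior predictive P(Y <= y | data) = alpha.

    Assumes X_1..X_n, Y are i.i.d. exponential with rate theta and the prior
    is proportional to 1/theta, so P(Y > y | data) = (S / (S + y))**n,
    where S = total is the sum of the n observations.
    """
    return total * ((1.0 - alpha) ** (-1.0 / n) - 1.0)


def estimate_coverage(theta, n, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of the frequentist probability
    P_theta(Y <= predictive alpha-quantile computed from X_1..X_n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Simulate a sample of size n and one future observation.
        total = sum(rng.expovariate(theta) for _ in range(n))
        y_new = rng.expovariate(theta)
        if y_new <= predictive_quantile(total, n, alpha):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Estimated coverage should be close to each nominal level alpha.
    for alpha in (0.5, 0.9, 0.95):
        print(alpha, estimate_coverage(theta=2.0, n=5, alpha=alpha))
```

To study a different prior, only `predictive_quantile` would need to change; the comparison of estimated coverage with the nominal level stays the same.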

The case of highest predictive density regions is more complex. As discussed 
in Section 4, there will either be a unique solution or else there will be infinitely
many solutions, depending on the linear independence or otherwise of the functions 
$\zeta_r(\theta, \alpha)$. Thus in any particular example it is necessary to examine carefully the
structure of the functions $\zeta_r(\theta, \alpha)$. If the statistical model has a suitable group
structure then this task is usually eased. One could also investigate local priors 
when the matrix $(b_{ij}(\theta))$ is invertible.

In the case of univariate observations, the results provide some guidance on the 
choice of objective prior if the main goal is to carry out Bayesian prediction and 
low predictive coverage probability bias is desired. In relation to the determination 
of an objective prior, for multivariate data the situation is less clear. When the 
functions $\zeta_r(\theta, \alpha)$ are linearly dependent, as often occurs in transformation models,
there will usually be an infinite number of UPMPs. Thus other considerations will 
need to be invoked in order to narrow down the choice of prior. For example, one 
might consider priors that are simultaneously predictive and posterior probability 
matching, reference priors ([2]) or priors that are minimax under suitable decision 
rules; in particular, for minimax prediction loss see, for example, [12] and [15]. 

Appendix: Proof of Theorem 3.1 

Proof. Let $a \in \Omega$ and consider the transformation $\phi = a\theta$. Let $J(\theta,a) = \partial\phi/\partial a$ be
the Jacobian matrix of this transformation for fixed $\theta$. Then the right Haar prior is
$\pi^H(\theta) \propto |J(\theta,e)|^{-1}$, where $|J(\theta,a)|$ is the determinant of $J(\theta,a)$; see, for example.

Write $\tilde\phi^r_s(\theta,a) = \partial\phi_r(\theta,a)/\partial a_s$ and define $\tilde\phi^r_s(\theta) = \tilde\phi^r_s(\theta,e)$, where $e$ is the identity
element of the group, so that $\pi^H(\theta) \propto |(\tilde\phi^r_s(\theta))|^{-1}$. Finally, let $(a^s_r(\theta))$ be the matrix
inverse of $(\tilde\phi^r_s(\theta))$. A standard result for the derivative of a matrix determinant then
gives

(5.1) $\partial_s\lambda^H(\theta) = -a^t_r(\theta)\,\partial_s\tilde\phi^r_t(\theta)$,

where $\lambda^H(\theta) = \log\pi^H(\theta)$.

Define $\phi^r_s(\theta,a) = \partial_s\phi_r = \partial\phi_r/\partial\theta_s$, with matrix inverse $\theta^s_r(\theta,a) = \partial\theta_s/\partial\phi_r$.
Since the definition of the right Haar prior depends on a specific group of transformations
on the parameter space, it is natural to regard Fisher's information as a
Riemannian metric tensor associated with the differentiable manifold of probability
densities $f(\cdot;\theta)$, $\theta \in \Omega$. This facilitates the study of the transformational properties
of the quantities $g^{st}(\theta)$ and $\mu_t(\theta,\alpha)$ appearing in the PDE (3.1). First, from the
invariance of the problem under $G$ and the contravariant tensorial property of $g^{st}$,
we have $g^{ij}(\phi) = g^{st}(\theta)\phi^i_s\phi^j_t$, where $g^{ij}(\phi)$ is the inverse Fisher information in the
$\phi$-parameterisation. Again using the invariance properties, it is seen that
$\mu_j(\phi,\alpha) = \mu_k(\theta,\alpha)\theta^k_j$, where $\mu_j(\phi,\alpha)$ is the function (2.3) in the $\phi$-parameterisation. Now write
$u^s(\theta,\alpha) = g^{st}(\theta)\mu_t(\theta,\alpha)$ and $u^i(\phi,\alpha) = g^{ij}(\phi)\mu_j(\phi,\alpha)$. Then

      $u^i(\phi,\alpha) = g^{st}(\theta)\mu_k(\theta,\alpha)\phi^i_s\phi^j_t\theta^k_j$

(5.2)       $= g^{st}(\theta)\mu_k(\theta,\alpha)\phi^i_s\delta^k_t = u^s(\theta,\alpha)\phi^i_s$,

where $\delta^k_t$ is the Kronecker delta function.

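Now differentiate both sides of (5.2) with respect to $a_r$ to give

(5.3) $\partial_s u^i(\phi,\alpha)\,\tilde\phi^s_r(\theta,a) = u^s(\theta,\alpha)\,\partial\phi^i_s(\theta,a)/\partial a_r = u^s(\theta,\alpha)\,\partial_s\tilde\phi^i_r(\theta,a)$.

Finally, setting $a = e$ and multiplying both sides of (5.3) by $a^r_i(\theta)$ gives

(5.4) $\partial_s u^i(\theta,\alpha)\,a^r_i(\theta)\,\tilde\phi^s_r(\theta) = u^s(\theta,\alpha)\,a^r_i(\theta)\,\partial_s\tilde\phi^i_r(\theta)$.

Since $(a^r_i(\theta))$ is the matrix inverse of $(\tilde\phi^i_r(\theta))$, the left-hand side of (5.4) is
$\partial_s u^i(\theta,\alpha)\delta^s_i = \partial_s u^s(\theta,\alpha)$, whereas the right-hand side is $-u^s(\theta,\alpha)\partial_s\lambda^H(\theta)$ from
(5.1). It follows that the right Haar prior $\pi^H$ is a solution of equation (3.1) and
hence of equation (2.5). □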
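
As an illustration of the Jacobian construction used in the proof, consider the location-scale group. This example is not taken from the text above; it is a sketch under the assumed parametrisation $\theta = (\mu, \sigma)$ with group element $a = (b, c)$, $c > 0$, acting by $a\theta = (b + c\mu, c\sigma)$.

```latex
% Assumed example: location-scale model, \theta = (\mu, \sigma), a = (b, c), c > 0.
\begin{align*}
\phi = a\theta &= (b + c\mu,\; c\sigma), \qquad e = (0, 1), \\
J(\theta, a) = \frac{\partial \phi}{\partial a}
  &= \begin{pmatrix} 1 & \mu \\ 0 & \sigma \end{pmatrix},
  \qquad |J(\theta, e)| = \sigma, \\
\pi^H(\theta) &\propto \sigma^{-1},
  \qquad \lambda^H(\theta) = -\log\sigma + \text{const}, \\
-\operatorname{tr}\{J^{-1}\,\partial_\sigma J\} &= -\sigma^{-1} = \partial_\sigma \lambda^H(\theta),
  \qquad
-\operatorname{tr}\{J^{-1}\,\partial_\mu J\} = 0 = \partial_\mu \lambda^H(\theta).
\end{align*}
```

The last line checks the determinant-derivative identity (5.1) directly in this case, and the resulting prior $\sigma^{-1}$ is the familiar right Haar prior for the location-scale group.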